Statistical Machine Translation with Long Phrase Table and without Long Parallel Sentences
Authors
Abstract
In this study, we focus on the reliability of the phrase table. To build the phrase table, we used Och's method [3]. This method sometimes generates a completely wrong phrase table. We found that such phrase tables are caused by long parallel sentences. Therefore, we removed these long parallel sentences from the training data. We also used general tools for statistical machine translation, such as "Giza++" [4], "moses" [5], and "training-phrase-model.perl" [6]. With our proposed method, we obtained a BLEU score of 0.2229 on the Intrinsic-JE task and 0.2393 on the Intrinsic-EJ task. With a standard method, we obtained a BLEU score of 0.2162 on the Intrinsic-JE task and 0.2533 on the Intrinsic-EJ task. This means that our proposed method was effective for the Intrinsic-JE task but not for the Intrinsic-EJ task. Overall, our system's performance was average among all systems: it ranked 20th of 34 systems on the Intrinsic-JE task and 12th of 20 systems on the Intrinsic-EJ task.
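The filtering step described in the abstract, removing long parallel sentences before training, can be illustrated with a minimal sketch. The abstract does not state the length threshold or how length is measured, so the function name `drop_long_pairs`, the parameter `max_tokens`, and the whitespace-based token count are all assumptions made for illustration. Japanese text would need to be segmented into words beforehand for the count to be meaningful.

```python
from typing import Iterable, List, Tuple

SentencePair = Tuple[str, str]


def drop_long_pairs(pairs: Iterable[SentencePair],
                    max_tokens: int = 40) -> List[SentencePair]:
    """Keep only pairs in which both sides have at most max_tokens tokens.

    max_tokens is a hypothetical threshold; the abstract gives no value.
    Tokens are counted by whitespace splitting, which assumes the text
    has already been word-segmented.
    """
    kept = []
    for source, target in pairs:
        if len(source.split()) <= max_tokens and len(target.split()) <= max_tokens:
            kept.append((source, target))
    return kept


# Toy corpus: the second pair has a 50-token source side and is dropped.
corpus = [
    ("kare wa gakusei desu", "he is a student"),
    (" ".join(["word"] * 50), "a short target"),
]
filtered = drop_long_pairs(corpus, max_tokens=10)
```

A pair is dropped when either side exceeds the threshold, so the two sides of the corpus stay aligned line by line after filtering.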
Similar resources
Statistical machine translation without long parallel sentences for training data
In this study, we focus on the reliability of the phrase table. We built the phrase table using Och's method [2]. This method sometimes generates completely wrong phrase tables. We found that such phrase tables are caused by long parallel sentences. Therefore, we removed these long parallel sentences from the training data. We also used general tools for statistical machine translat...
Statistical machine translation using large j/e parallel corpus and long phrase tables
Our statistical machine translation system that uses large Japanese-English parallel sentences and long phrase tables is described. We collected 698,973 Japanese-English parallel sentences, and we used long phrase tables. Also, we utilized general tools for statistical machine translation, such as "Giza++" [1], "moses" [2], and "training-phrase-model.perl" [3]. We used these data and these tools, W...
Collecting Bilingual Technical Terms from Patent Families of Character-Segmented Chinese Sentences and Morpheme-Segmented Japanese Sentences
In manual translation of patent documents, a technical term bilingual lexicon is inevitable for a translator to efficiently translate patent documents. Dong et al. (2015) proposed a method of generating bilingual technical term lexicon from morpheme-segmented parallel patent sentences. The proposed method estimates Japanese-Chinese translation of technical terms using the phrase translation tab...
Joint Phrase Alignment and Extraction for Statistical Machine Translation
The phrase table, a scored list of bilingual phrases, lies at the center of phrase-based machine translation systems. We present a method to directly learn this phrase table from a parallel corpus of sentences that are not aligned at the word level. The key contribution of this work is that while previous methods have generally only modeled phrases at one level of granularity, in the proposed m...
A Comparison of Pivot Methods for Phrase-Based Statistical Machine Translation
We compare two pivot strategies for phrase-based statistical machine translation (SMT), namely phrase translation and sentence translation. The phrase translation strategy means that we directly construct a phrase translation table (phrase-table) of the source and target language pair from two phrase-tables; one constructed from the source language and English and one constructed from English a...